Skip to content

support gemma4 use_hybrid_sliding#129

Open
Jintao-Huang wants to merge 1 commit into
modelscope:mainfrom
Jintao-Huang:support_gemma4_use_hybrid_sliding
Open

support gemma4 use_hybrid_sliding#129
Jintao-Huang wants to merge 1 commit into
modelscope:mainfrom
Jintao-Huang:support_gemma4_use_hybrid_sliding

Conversation

@Jintao-Huang

Copy link
Copy Markdown
Collaborator

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a hybrid attention mode for Gemma4 in multimodal scenarios, allowing full attention layers to fallback to unfused attention with arbitrary masks while sliding layers utilize the configured backend with native sliding windows. The feedback recommends safely accessing the attention backend name using getattr to prevent potential AttributeErrors. Additionally, it is suggested to replace the mm_token_type_ids is None check with a more robust has_vision flag to correctly handle text-only batches that contain zero-valued token type tensors.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

# In hybrid mode (backend != unfused), only full attention layers need arbitrary mask;
# sliding layers keep causal mask and use native flash sliding window.
# When backend is unfused, all layers use arbitrary mask.
if config.attention_backend.name == 'unfused' or not self.is_sliding:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To prevent potential AttributeError if config.attention_backend is None or does not have a name attribute (e.g., in certain testing or custom configurations), use getattr to safely retrieve the backend name.

Suggested change
if config.attention_backend.name == 'unfused' or not self.is_sliding:
if getattr(config.attention_backend, 'name', None) == 'unfused' or not self.is_sliding:

'errors in multimodal scenarios. Please always pass pure text data.')
text_config.use_bidirectional_attention = None
else:
if config.attention_backend.name == 'unfused':

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Safely access the attention_backend name using getattr to avoid potential AttributeError if the backend is not configured or is None.

Suggested change
if config.attention_backend.name == 'unfused':
if getattr(config.attention_backend, 'name', None) == 'unfused':

Comment on lines +539 to +540
self.hybrid_attention_mode = (
text_config.use_bidirectional_attention == 'vision' and config.attention_backend.name != 'unfused')

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Safely access the attention_backend name using getattr to avoid potential AttributeError if the backend is not configured or is None.

        self.hybrid_attention_mode = (\n            text_config.use_bidirectional_attention == 'vision' and getattr(config.attention_backend, 'name', None) != 'unfused')

sliding_attention = sliding_attention | window_mask
# In hybrid mode, only skip sliding mask when no vision tokens (pure text batch).
# When vision tokens are present, fall back to all-unfused behavior for correctness.
use_hybrid_sliding = self.hybrid_attention_mode and mm_token_type_ids is None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If mm_token_type_ids is not None but contains no vision tokens (e.g., a tensor of all zeros, which is common for text-only batches in multimodal pipelines), checking mm_token_type_ids is None will incorrectly evaluate to False. This disables hybrid sliding and falls back to the slower manual sliding window mask. We should check if there are actually any vision tokens present in the batch.

        has_vision = mm_token_type_ids is not None and (mm_token_type_ids > 0).any()\n        use_hybrid_sliding = self.hybrid_attention_mode and not has_vision

attention_mask['sliding_attention'] = sliding_attention
attention_mask['full_attention'] = full_attention
# Signal sliding layers to override attn_mask_type to arbitrary in hybrid mode with vision
attention_mask['_sliding_override_arbitrary'] = self.hybrid_attention_mode and mm_token_type_ids is not None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use the has_vision boolean flag defined earlier to avoid overriding the attention mask to arbitrary when there are no actual vision tokens in the batch.

Suggested change
attention_mask['_sliding_override_arbitrary'] = self.hybrid_attention_mode and mm_token_type_ids is not None
attention_mask['_sliding_override_arbitrary'] = self.hybrid_attention_mode and has_vision

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant